Book Reviews: The Text Mining Handbook: Advanced Approaches to Analyzing Unstructured Data by Ronen Feldman and James Sanger
Abstract
Text mining is the process of discovering information in large text collections and automatically identifying interesting patterns and relationships in textual data. It is a relatively new research area that has recently attracted much interest from the research and industry communities, mainly because of the continuously increasing amount of information available on the Web and elsewhere. Text mining is a highly interdisciplinary research area, bringing together research insights from the fields of data mining, natural language processing, machine learning, and information retrieval. In particular, text mining is closely related to the older area of data mining, which targets the extraction of interesting information from data records, although text mining is allegedly more difficult, as the source data consists of unstructured collections of documents rather than structured databases.

The book by Feldman and Sanger is a thorough introduction to text mining, covering the general architecture of text mining systems along with the main techniques used by such systems. It addresses both the theory and the practice of text mining, and it illustrates the different techniques with real-world scenarios and practical applications. It is particularly relevant for students and professional practitioners, being structured as a self-contained handbook that does not require previous experience in any of the research fields involved.

The book is structured into twelve chapters, which gradually introduce the area of text mining and related topics, starting with an introduction to the task of text mining and ending with examples of practical applications from three different domains.

The first chapter can be regarded as an overview of the book. It starts by defining the problem of text mining and its key elements: the document collections, the document features (words, terms, and concepts), and the role of background knowledge. It then briefly touches upon the possible applications of text mining, such as pattern discovery and trend analysis, and gives a short discussion of the interface layer of text mining systems. The second part of the chapter lays down the general architecture of a text mining system, which also serves as a rough guide to the rest of the book, as it outlines the main components that are described in detail in subsequent chapters.

Chapter 2 is one of the longest chapters in the book, and also one of the most dense in terms of newly introduced concepts. Despite being a more difficult read compared …
Similar resources
The Text Mining Handbook - Advanced Approaches in Analyzing Unstructured Data
Text mining is a new and exciting area of computer science research that tries to solve the crisis of information overload by combining techniques from data mining, machine learning, natural language processing, information retrieval, and knowledge management. Similarly, link detection – a rapidly evolving approach to the analysis of text that shares and builds on many of the key elements of te...
JALDA's Interview with Professor James P. Lantolf
James P. Lantolf is George and Jane Greer Professor Emeritus of Language Acquisition and Applied Linguistics and former director of the Center for Advanced Language Proficiency Education and Research at the Pennsylvania State University, USA. He is currently Adjunct Professor of Applied Linguistics in the same academic unit at Xi’an Jiaotong University. He is founder of the Sociocultural Theory...
Text Mining at the Term Level
Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. Previous work in text mining focused at the...
Web-Based Text Mining of Hotel Customer Comments Using SAS® Text Miner and Megaputer PolyAnalyst®
This paper presents text mining using SAS® Text Miner and Megaputer PolyAnalyst® specifically applied for hotel customer survey data, and its data management. The paper reviews current literature of text mining, and discusses features of these two text mining software packages in analyzing unstructured qualitative data in the following key steps: data preparation, data analysis, and result repo...
Circle Graphs: New Visualization Tools for Text-Mining
The proliferation of digitally available textual data necessitates automatic tools for analyzing large textual collections. Thus, in analogy to data mining for structured databases, text mining is defined for textual collections. A central tool in text-mining is the analysis of concept relationship, which discovers connections between different concepts, as reflected in the collection. However,...
Publication date: 2008